Concepts

Byte

  • The smallest addressable unit of memory on a system.

Size

  • A byte is not always the same as a u8 (unsigned 8-bit integer), although the two are often treated as equivalent on modern systems.

    • On almost all modern hardware:

      • 1 byte = 8 bits

    • So a byte happens to match a u8

    • That’s why in practice people often treat them as equivalent.

  • Its size is defined by the architecture, not the language.

  • 6-bit byte systems:

    • IBM 1401

    • CDC 6600

    • Reason:

      • Early character sets (such as 6-bit BCD codes) fit in 6 bits (64 symbols)

      • Optimized for text and business data

  • 9-bit byte systems:

    • DEC PDP-10

    • Reason:

      • Used 36-bit words, often split into four 9-bit bytes (4 × 9 = 36).

      • Its byte instructions accepted any byte size from 1 to 36 bits, so 9 bits was a common convention rather than a fixed hardware size.

  • Today, virtually all general-purpose CPUs use 8-bit bytes.

  • This is standardized in practice because:

    • ASCII is a 7-bit code, and its extensions (Latin-1, later UTF-8) settled on 8-bit units

    • Hardware, networking, storage all aligned around 8 bits
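
To make the 36-bit word layouts above concrete, here is a minimal sketch that splits one 36-bit value into four 9-bit bytes or six 6-bit characters. The function name `split_word` and the sample value are made up for illustration, and the word is simply held in the low bits of a `u64`, since Rust has no 36-bit integer type.

```rust
/// Split the low 36 bits of `word` into equal fields of `bits` bits each,
/// most significant field first. (`split_word` is an illustrative helper.)
fn split_word(word: u64, bits: u32) -> Vec<u16> {
    const WORD_BITS: u32 = 36;
    // Only field sizes that divide the word evenly and fit in a u16.
    assert!(bits > 0 && bits <= 16 && WORD_BITS % bits == 0);
    let count = WORD_BITS / bits;
    let mask = (1u64 << bits) - 1;
    (0..count)
        .rev()
        .map(|i| ((word >> (i * bits)) & mask) as u16)
        .collect()
}

fn main() {
    // 12 octal digits = 36 bits; each octal digit is 3 bits.
    let word: u64 = 0o123_456_701_234;
    println!("9-bit bytes:      {:?}", split_word(word, 9));
    println!("6-bit characters: {:?}", split_word(word, 6));
}
```

Octal is used for the sample value because three octal digits line up with one 9-bit byte and two with one 6-bit character.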

C/C++ standard
  • sizeof(char) == 1 → this is a byte

  • But a byte is only guaranteed to be at least 8 bits (CHAR_BIT >= 8), not exactly 8.

  • Theoretically:

    • 1 byte could be 16 bits

    • Then:

      • char = 1 byte = 16 bits

      • u8 = 8 bits → not the same thing

  • uint8_t exists only if the platform actually provides a type of exactly 8 bits

  • char ≠ guaranteed 8 bits

Rust
  • u8 is always 8 bits

  • u8 is effectively the language’s “byte”

  • Rust assumes 8-bit bytes on all supported platforms
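
The points above can be checked directly in Rust. This is a small sketch, not an authoritative test of any platform: `bits_per_byte` is an illustrative helper, and `c_char` is included only to show that the C `char` type, as Rust sees it, also occupies one byte.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

/// Bits in one addressable byte, derived from `u8`
/// (`bits_per_byte` is an illustrative helper).
fn bits_per_byte() -> u32 {
    // `u8` occupies exactly one byte, so its bit width is the byte width.
    u8::BITS / size_of::<u8>() as u32
}

fn main() {
    println!("bits per byte       = {}", bits_per_byte());
    println!("size_of::<u8>()     = {}", size_of::<u8>());
    println!("size_of::<c_char>() = {}", size_of::<c_char>());
    println!("size_of::<u64>()    = {}", size_of::<u64>());
}
```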

word, dword, qword

  • A word is the natural data size of a CPU—the size it processes most efficiently.

  • Historically: tied directly to register width.

  • Today: still loosely tied to architecture, but terminology is often legacy-driven.

  • The CPU/ISA defines them. They are not defined by the OS.

  • word = 16 bits (from the original 16-bit CPUs like the 8086)

  • dword = “double word” = 32 bits

  • qword = “quad word” = 64 bits

  • Example x86 assembly:

    mov eax, dword ptr [rbx]    ; load 32 bits (a dword) from the address in rbx into eax
    

Name confusion

  • These meanings are not universal, just widely adopted due to x86.

  • Architectural definition

    • A word = register size

    • 16-bit CPU → 16-bit word

    • 32-bit CPU → 32-bit word

    • 64-bit CPU → 64-bit word

  • x86 legacy usage:

    • word = 16 bits (even on 64-bit CPUs)

    • dword = 32 bits

    • qword = 64 bits

  • So on modern x86, a “word” is not the natural CPU size anymore

  • Low-level languages usually avoid ambiguity:

    • C/C++:

      • Avoid “word” entirely

      • Use uint32_t, uint64_t, etc.

    • Odin / Rust:

      • Explicit sizes (u8, u16, u32, u64)

      • No reliance on “word” terminology
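
As a rough illustration of how the x86 names map onto explicit-size types, the sketch below defines `Word`, `Dword`, and `Qword` as aliases. These aliases and the `load_dword` helper are hypothetical and exist only for this example; idiomatic Rust would use `u16`, `u32`, and `u64` directly.

```rust
use std::mem::size_of;

// Hypothetical aliases mirroring x86 assembler terminology.
type Word = u16;
type Dword = u32;
type Qword = u64;

/// Read a little-endian dword from the first 4 bytes of `bytes`,
/// roughly what `mov eax, dword ptr [rbx]` does on x86.
/// Returns `None` if fewer than 4 bytes are available.
fn load_dword(bytes: &[u8]) -> Option<Dword> {
    let b = bytes.get(..4)?;
    Some(Dword::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn main() {
    println!("word  = {} bits", size_of::<Word>() * 8);
    println!("dword = {} bits", size_of::<Dword>() * 8);
    println!("qword = {} bits", size_of::<Qword>() * 8);
    // usize is pointer-sized, the closest Rust has to a "natural" word.
    println!("usize = {} bits", size_of::<usize>() * 8);

    let memory = [0x78, 0x56, 0x34, 0x12, 0xFF];
    println!("loaded dword = {:#x?}", load_dword(&memory));
}
```

The aliases never change size with the target, which is exactly the x86 convention: a "word" stays 16 bits even when `usize` is 64 bits.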

Register Efficiency

  • Question: "Is using a byte (u8) on a 64-bit system less efficient than using a u64?"

  • Using u8 is not inherently inefficient on a 64-bit system, but there are cases where u64 is faster.

  • A 64-bit CPU (e.g., x86-64, ARM64) is optimized for 64-bit registers, so:

    • Operations on u64:

      • Usually map directly to single instructions

      • Fully utilize registers

    • Operations on u8:

      • Often get promoted to 32 or 64 bits internally

      • May involve extra masking or extension instructions

  • So for pure arithmetic:

    • u64 → often more efficient

    • u8 → sometimes slightly less efficient

  • Where u8 can actually be more efficient:

    • u8 uses 8× less memory than u64

    • Smaller data:

      • Better cache utilization

      • Fewer cache misses

      • Higher bandwidth efficiency

    • Example:

      • Processing a large array:

      • u8[] → more data fits in cache → often faster overall

      • u64[] → fewer elements per cache line

    • Modern CPUs use SIMD heavily:

      • With u8:

        • You can process 16–64 elements at once (16 with 128-bit SSE2, 32 with AVX2, 64 with AVX-512)

      • With u64:

        • Only 2–8 elements at once
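
The cache and SIMD figures above are simple divisions, sketched below. The 64-byte cache line is an assumption (common on x86-64 and many ARM64 cores, but not universal), and both helper functions are illustrative names rather than any library API.

```rust
use std::mem::size_of;

/// Assumed cache-line size in bytes; common, but not guaranteed.
const CACHE_LINE_BYTES: usize = 64;

/// How many values of type `T` fit in one (assumed) cache line.
fn elems_per_cache_line<T>() -> usize {
    CACHE_LINE_BYTES / size_of::<T>()
}

/// How many values of type `T` fit in a SIMD register of `register_bits` bits.
fn simd_lanes<T>(register_bits: usize) -> usize {
    register_bits / (size_of::<T>() * 8)
}

fn main() {
    println!(
        "per cache line: {} x u8, {} x u64",
        elems_per_cache_line::<u8>(),
        elems_per_cache_line::<u64>()
    );
    // 128-bit (SSE2), 256-bit (AVX2), 512-bit (AVX-512) registers.
    for bits in [128, 256, 512] {
        println!(
            "{}-bit SIMD: {} x u8, {} x u64",
            bits,
            simd_lanes::<u8>(bits),
            simd_lanes::<u64>(bits)
        );
    }
}
```

These numbers only bound how much data moves per cache line or per instruction; whether a real loop is faster with u8 still depends on the workload and on what the compiler generates.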

CPU Architectures

| Term  | Architecture   | Bits   | Notes                                        |
| ----- | -------------- | ------ | -------------------------------------------- |
| x86   | Intel (legacy) | 32-bit | Usually the 32-bit form; family began 16-bit |
| x64   | x86-64         | 64-bit | Extension of x86                             |
| amd64 | x86-64         | 64-bit | Same as x64 (AMD name)                       |
| arm64 | ARM (AArch64)  | 64-bit | Different ISA entirely                       |

x86 (32-bit)
  • In common usage, “x86” means the 32-bit form introduced with the 80386; the family itself began as 16-bit with the 8086

  • Key properties:

    • 32-bit registers (EAX, EBX, etc.)

    • 32-bit address space (~4 GB limit)

    • Complex instruction set (CISC)

  • When someone says:

    • “x86 build” → usually means 32-bit binary

  • Why is it called x86?

    • Refers to the classic Intel architecture starting from:

      • 8086 → 80286 → 80386 → 80486 → Pentium...

    • Instead of listing all of them, people started referring to the whole family as:

    • “x86” = any processor in the *86 family

    • The “x” is just a wildcard.
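
The ~4 GB limit listed under the key properties follows from the pointer width: n address bits can name 2^n distinct bytes. A minimal sketch of that arithmetic, with `address_space_bytes` as an illustrative helper:

```rust
/// Number of distinct byte addresses reachable with `pointer_bits`-bit
/// addresses, or `None` if the count does not fit in a `u128`.
fn address_space_bytes(pointer_bits: u32) -> Option<u128> {
    1u128.checked_shl(pointer_bits)
}

fn main() {
    for bits in [32u32, 64] {
        match address_space_bytes(bits) {
            // Shifting right by 30 converts bytes to GiB (2^30 bytes each).
            Some(bytes) => println!(
                "{}-bit addresses: {} bytes = {} GiB",
                bits,
                bytes,
                bytes >> 30
            ),
            None => println!("{}-bit addresses: too large to represent", bits),
        }
    }
}
```

The 64-bit figure is an architectural upper bound; actual x86-64 CPUs implement fewer address bits than the full 64.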

x64 / amd64 (64-bit x86)
  • These are the same thing.

  • AMD created the 64-bit extension to x86

  • Called it amd64

  • Intel adopted it (called it Intel 64)

  • So:

  • x64 = amd64 = x86-64

  • Key properties:

    • 64-bit registers (RAX, RBX, etc.)

    • Much larger address space

    • Backward compatible with 32-bit x86

ARM64 (AArch64)
  • Completely different architecture from x86.

  • Designed by ARM Holdings

  • Used in:

    • Phones

    • Tablets

    • Apple Silicon (M1/M2/M3)

    • Many servers now

  • Key properties:

    • 64-bit only (in ARM64 mode)

    • Simpler instruction set (RISC)

    • Different registers and instructions from x86
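
These architecture names also show up in build tooling. As one example, Rust exposes the compile target through `std::env::consts::ARCH` and the `cfg!(target_arch = ...)` macro. The sketch below maps Rust's identifiers to the informal names used in these notes; `common_name` is an illustrative helper, and its mapping covers only the three families discussed here.

```rust
use std::env::consts::ARCH;
use std::mem::size_of;

/// Map Rust's architecture identifier to the informal names in these notes.
/// Anything else is reported as "other".
fn common_name(arch: &str) -> &'static str {
    match arch {
        "x86" => "x86 (32-bit)",
        "x86_64" => "x64 / amd64",
        "aarch64" => "arm64",
        _ => "other",
    }
}

fn main() {
    println!("compiled for: {} ({})", ARCH, common_name(ARCH));
    println!("pointer width: {} bits", size_of::<usize>() * 8);

    // cfg! is resolved at compile time, so only one branch is ever live.
    if cfg!(target_arch = "x86_64") {
        println!("this binary uses the x86-64 instruction set");
    } else if cfg!(target_arch = "aarch64") {
        println!("this binary uses the AArch64 instruction set");
    }
}
```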

What the CPU provides vs Typing

  • u8, u16, u32, u64

  • i8, i16, i32, i64

  • int, uint

  • bool

  • pointers

  • struct

  • etc

  • These types are ways for a programming language to describe how many bits to use and how to interpret them.

  • The CPU itself only sees bits and instructions.

  • The hardware supports multiple widths—but it does not define “types”

  • A CPU (ISA) defines:

    • Register sizes (e.g., 64-bit registers on x64)

    • Operand widths (8-, 16-, 32-, and 64-bit operations)

    • Operations: add, sub, mul, load/store, etc.

  • Example (x86-64 idea):

    • add rax, rbx → 64-bit add

    • add eax, ebx → 32-bit add

    • add al, bl → 8-bit add

  • The CPU doesn’t inherently “know” signed vs unsigned

    • Difference comes from which instructions you use:

      • Unsigned division → div

      • Signed division → idiv

    • Signed vs unsigned comparisons → the same compare instruction, but different condition codes afterward (e.g., jl vs jb on x86)

    • Signedness is a semantic layer imposed by the compiler, not a stored property.

  • For pointers, the CPU just sees a number:

    • 32-bit → pointer = 32-bit integer

    • 64-bit → pointer = 64-bit integer

    • There's no special “pointer hardware type”.

  • A struct is a group of fields with layout rules; a memory pattern. The CPU just sees contiguous memory.
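
The three ideas above (signedness as interpretation, pointers as numbers, structs as memory layout) can be sketched in a few lines. The names `both_views`, `address_of`, and `Sample` are made up for this example. The struct is marked `#[repr(C)]` because Rust's default layout may reorder fields; the size of 12 assumes `u32` has 4-byte alignment, which holds on mainstream targets.

```rust
use std::mem::{align_of, size_of};

/// A struct with C-compatible layout, so field order and padding are predictable.
#[allow(dead_code)]
#[repr(C)]
struct Sample {
    tag: u8,    // offset 0, followed by 3 bytes of padding
    value: u32, // offset 4
    flag: u8,   // offset 8, followed by 3 bytes of tail padding
}

/// Interpret the same 8 bits as unsigned and as signed.
fn both_views(bits: u8) -> (u8, i8) {
    (bits, bits as i8)
}

/// Return the address held by a reference as a plain integer.
fn address_of<T>(r: &T) -> usize {
    r as *const T as usize
}

fn main() {
    // Same bit pattern, two interpretations.
    let (unsigned, signed) = both_views(0xFE);
    println!("0xFE as u8 = {}, as i8 = {}", unsigned, signed);

    // Division differs by signedness (div vs idiv on x86)...
    println!("{} / 2 = {}, {} / 2 = {}", unsigned, unsigned / 2, signed, signed / 2);
    // ...while wrapping addition produces the same bits either way.
    println!(
        "after +1: u8 bits = {:#04x}, i8 bits = {:#04x}",
        unsigned.wrapping_add(1),
        signed.wrapping_add(1) as u8
    );

    // A pointer is just an integer-sized address.
    let x = 42u32;
    println!("address of x = {:#x}", address_of(&x));

    // A struct is fields plus padding in contiguous memory.
    println!(
        "size_of::<Sample>() = {}, align_of::<Sample>() = {}",
        size_of::<Sample>(),
        align_of::<Sample>()
    );
}
```

Only 6 of the 12 bytes in `Sample` hold data; the rest is padding that the layout rules insert so that `value` stays aligned.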